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DAVIDON-BROYDEN RANK-ONE MINIMIZATION METHODS 
IN HILBERT SPACE WITH APPLICATION TO 
OPTIMAL CONTROL PROBLEMS * 

By Terry A. Straeter 
Langley Research Center 

SUMMARY 

The Davidon-Broyden class of rank -one quasi -Newton minimization methods is 
extended from Euclidean spaces to infinite dimensional real Hilbert spaces. The mem- 
bers of this class of minimization methods are distinguished by the manner in which a 
particular parameter (the step size) is chosen at each iteration. For several techniques 
of choosing the step size, conditions are found which assure convergence of the associated 
iterates to the location of the minimum of a positive definite quadratic functional. For 
those techniques, convergence is achieved without the problem associated with many other 
first-order minimization methods, namely, the computation of a one -dimensional mini- 
mum at each iteration. The application of this class of minimization methods for the 
direct computation of the solution of an optimal control problem is outlined. The perfor- 
mance of various members of the class are compared by solving a sample optimal con- 
trol problem. Finally, the sample problem is solved by other known gradient methods 
and the results are compared with those obtained with the rank-one quasi-Newton methods. 

INTRODUCTION 

In the past few years the problem of finding the location of the minimum value of,.a 
real -valued function of n real variables by numerical methods has been the subject of 
a great deal of research. (See refs. 1 to 4.) Several iterative procedures have been 
developed to solve the problem. Much of the work has been directed toward developing 
algorithms which use the function value and its gradient to locate the minimum by itera- 
tion. This type of algorithm is usually referred to as a gradient method. Historically, 
the method of steepest descent was the first such method. In order to accelerate conver- 

* The material presented herein includes information from a thesis entitled "On the 
Extension of the Davidon-Broyden Class of Rank-One, Quasi-Newton Minimization Methods 
to an Infinite Dimensional Hilbert Space With Applications to Optimal Control Problems" 
by Terry Anthony Straeter submitted to North Carolina State University, Raleigh, North 
Carolina, 1971, in partial fulfillment of the requirements for the degree of Doctor of 
Philosophy in Applied Mathematics. 



gence, the method of conjugate gradients, developed by Hestenes and Stiefel (ref. 5), was 
applied to the minimization problem by Fletcher and Reeves (ref. 4). Later various ver- 
sions of it were extended to real Hilbert spaces (refs. 6 and 7) and applied to optimal con- 
trol problems (refs. 8 and 9). Other first-order methods were developed which were 
inspired by Newton's second-order method. In 1959 Davidon (ref. 2) proposed two effec- 
tive techniques for solving the problem. The first method, given in the main body of his 
report, was modified in 1963 by Fletcher and Powell (ref. 3). This algorithm is referred 
to as the DFP method. They established that for any differentiable real-valued function, 
the method is stable; that is, the value of the function is monotonically decreasing with 
each iteration. Moreover, they showed that for a real -valued quadratic function of 
n variables, the DFP algorithm converges in a finite number of steps. In fact, at most 
n + 1 steps are needed. In 1968 Horwitz and Sarachik (ref. 10) extended the DFP method 
from an n -dimensional Euclidean vector space to an infinite -dimensional real Hilbert 
space and established convergence of the iterates when the functional to be minimized is 
quadratic. In 1970 Tokumaru, Adachi, and Goto (ref. 11) also extended the DFP algorithm 
to an infinite -dimensional real Hilbert space and gave a comparison of the DFP method, 
the steepest descent method, and the conjugate gradient method on some sample optimal 
control problems. 

The second method due to Davidon, which he called a rank-one variance method, 
was outlined in the appendix to reference 2. In reference 12 and again in reference 13, 
he published modifications of the second method and established conditions insuring its 
convergence to the minimum of a quadratic function of n variables in a finite number 
of steps, at most n + 1, and insuring the stability of the method. In 1965 Broyden (ref. 1) 
proposed a family of methods, which he called quasi-Newton, based on an arbitrary param- 
eter a (the step size). If a = 1, then under certain conditions, Broyden' s method and 
the second Davidon method are the same. In 1969 Goldfarb (ref. 14) established conver- 
gence of the iterates of a rank-one algorithm for a class of real functions of n variables 
when a is chosen by means of a linear minimization technique (that is, a one -dimensional 
search). He called the algorithm with the step size chosen by a one -dimensional minimi- 
zation "an optimal variance algorithm." 

The purpose of this paper is to extend the Davidon-Broyden family of algorithms to 
an infinite -dimensional real Hilbert space, to establish conditions guaranteeing conver- 
gence of the iterates for various algorithms in the family, and to apply the family of algo- 
rithms to optimal control problems. Also presented herein is a comparison of the rank- 
one methods with other first-order algorithms on a sample optimal control problem. 
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SYMBOLS 


,-l 


C 

f 

f,i 


self-adjoint, positive, linear operator from H to H 


inverse of A 


B( n ),y( n ) sequence of linear operators 


fixed element of H 

linear operator from H to H 

function from R n+m +l into R n 

n x m matrix with elements ( ] where i = 1, 2, 

\ 9u j/ 


j = 1, 2, . . ., m 


QU 


n x n matrix with elements where i = 1, 2, 

C/Xi 


j 1, 2, . , n 


gradient of J 


gn>S 

H 

h 


gradient J at x n and x*, respectively 


real Hilbert space 


element of H 


identity 


n,q,r 


integers 


functional defined on H 


J o, M , m , M 0 

m 0’^m’ K 


real numbers 


., n and 


, n and 
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L 


function from Rn+m+1 i n t 0 r 


j x 


_ / 9L 9L 


9L 

9x n 


9xj’ 9X2’ ' ' 

square integrable functions from (tQ,t j) into R r 
M(n),N(n) set of integers 


P(t),R(t)A 

G(t),B(t)J 

Pi 

R 


S, M, Sx, Sx^ 
s n 

t 

to^i 

U x 


u 

X,U,X 0 ,X* 

X 


x l5 x 2 ,x 


X 


n 


y n 


matrix -valued function of t 

elements of H 

set of all real numbers 

nth residual vector, element of H, defined by equations (10) 
subspaces of H 

direction of nth step, element of H 
independent variable 

initial and final values of the independent variable 
linear operator from H to H 
control variable 
elements of H 

element of H at which J is minimized 
state variables 
nth iterate, element of H 
el ement of H d efined_bv g*.- g n 
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scalars 


a n step size, a real number 

6 U change in u 

e n scalar, 1 - a n 

e-^(h) functional from H to R 

8 zero of H 

X,Xi,X2 adjoint variables 

scalar defined by equation (14) 
cr n defined by x n+ q - x n 

r n scalars defined by equations (11) 

V u gradient with respect to u 

(•,•) inner product on H 

|| • 1 1 norm on H defined as (• , and norm of operator A defined by 

1 1 A |I = inf (K: 1 1 Ax 1 1 ^K||x||> 

Abbreviations: 

conv convex hull 

dim(H) dimension of H 

inf infimum (lower bound) 

sup supremum (upper bound) 

Dots over symbols denote derivatives with respect to t. 
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PROBLEM FORMULATION FOR QUADRATIC CASE 


Consider the problem of finding the element x e H which minimizes the quadratic 
functional J:H — R given by 

J(x) = J 0 + (x,b) + |(x,Ax) (1) 

where Jq e R, the real numbers, and b is a fixed element in H, a real Hilbert space 
with inner product (•,•) and zero 8. 

In equation (1) A:H — H is a linear self-adjoint operator; thus, 

m(x,x) i (x,Ax) ^ M(x,x) (2) 


where 


M = sup 
x* 6 


(x, Ax) 
(x,x) 


m = inf 
x±6 


(x, Ax) 
(x,x) 


and where it is assumed that 0 < m £ M. Hence, II A| I = M (ref. 15) where 


||A|| = inf{K:||Ax|| £K||x||} 

Since m > 0, A - * exists (ref. 16) and A"* is also a self-adjoint operator. Moreover, 

^(x,x) (3) 

A functional J:H — R is said to be differentiable at x if there exists a bounded 
linear functional U X :H — R such that for h e H 


J(x + h) - J(x) = U x (h) + e^h) (4) 

^1 1 /o 

where — - - . — 0, as (h,h) ' c — 0 (Frechet differential). If such a functional U x 
(h,h) 1/2 

exists, then it is unique. (See ref. 17.) Moreover, by the Riesz representation theorem 
(ref. 18), for each x e H there exists a vector g e H such that (g,h) = U x (h) for all 
h e H and g is given by 
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(g,h) = 


dJ(x + th) 
dt 


t=0 


The vector g is called the gradient of the functional J. 

For the quadratic functional defined in equation (1), the gradient is given by 


g(x) = Ax + b 


(5) 


A well-known necessary and sufficient condition that x minimize J(x) as given 
by equations (1) and (2) is that g(x) = 6 where 6 denotes the zero element of H. (See 
ref. 15.) 

Hence, if x denotes the location of the minimum of the quadratic functional J 
given by equation (1), then 

x = -A -i b (6) 

Moreover, if x,h e H are such that x + h = x, then 

h = -A _1 g(x) (7) 

Of course, the equation h = -A“*g(x) is the basis for the well-known Newton-Raphson 
method for solving the operator equation g(x) =6 on a real Hilbert space. (See ref. 15.) 
As a final preliminary note, recall that if A and B are positive self-adjoint linear 
operators with (x,Ax) = (x,Bx) for all x e H, it is said that A = B. (See ref. 18.) 


DAVIDON-BROYDEN ALGORITHMS 


Let J:H — R be a differentiable functional with gradient g. Let xq e H be the 
initial estimate of the location of the minimum of J, and let be a self-adjoint lin- 

ear operator from H onto H. Moreover, let Mq = mg > 0 be such that for all x, 
mg(x,x) ^ (x,\H°)x) i Mg(x,x), that is, v(o) is strongly positive. (See ref. 18.) If J, 
the functional to be minimized, happens to be quadratic, then V^) can be viewed as an 
estimate of A“*. The quantities J(xq) and g(xg) are computed and the first (n = 0) 
and successive iterations are obtained as follows: 

(1) Let 

x* = x n - Q! n v( n )g n (8) 
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where g n denotes g^x n j and a n> the step size, is a scalar, the choice of which is dis- 
cussed later. Let 

s n = -v( n )g n (9) 

and compute J(x*) and g(x*) denoted by g*; if (g*,g*) = 0, a necessary condition for 
x* to be the location of the minimum, the computation is stopped. If J is a quadratic 
functional as in equations (1) and (2) and g* = 8, then x* is the location of the minimum. 
(See ref. 15.) 

(2) Compute the vector 

r n = V (n) g* - V^ n )g n + a n v( n )g n "'j 
r n = V< n >[g* - (1 - 0! n )g n J 
r n = v(n) y n - a n s n 

where 

y n = g* - Sn 

The vector r n is called the residual vector for reasons explained in a subsequent sec- 
tion. If r n = e and a n * 1, set Q! n = 1 and return to step (1). If r n = Q and a n = 1, 
then set r n = 0 and v( n+ l) = v( n ) and go to step (5). 

(3) Define the scalar r n as 

j"(y n > r n) -1 

n 

(4) Let 

v ( n +!) = v (n) + r n B (n) (12) 

where B^:H — H is defined such that for all x e H 

B (n) x =_(x,r n )r n (13) 



> (10) 


8 



(5) If J(x*) < J(xn), let x n+ i = x* and, consequently, J(x n+ i) = J(x*) and 
g n+ i = g*. Otherwise, if r n i 0, let x n+1 = x n so that J(x n+1 ) = 'J(x n ) 'and g n+ i =“g n ; 
If T n = 0, then return to step (1) and choose Q! n so that J(x*) < J^x n ^. Set n = n + 1 
and go to step (1). 

Figure 1 gives a two-dimensional illustration of the behavior of the algorithm. The 
figure depicts the level curves of a hypothetical function J, the iterates generated by the 
algorithm, the negative gradient at each iterate, and the direction of the step generated 
by equation (9) at each iterate. Notice when v(0) is the identity, then the first direc- 
tion is the negative gradient. Also note that the illustration shows that xj = X 2 ', that is, 
J(x*) for the first iteration is greater than J(x]j. Hence by step (5), X 2 = xj. How- 
ever, the computation of g* is used to compute v(2) which determines, the choice 
of S2- The point xg is then such that J^xg) shows a substantial improvement over 
J^X 2 ^. From xg, the iterates would continue until the necessary conditions for a mini- 
mum are satisfied. 


Si 


Vi 



J = c. 


a 2 s 2 


Figure 1.- Example of progress of algorithm. 
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Notice that the operator v( n+ ^:H - H given in equation (12) can be expressed as 


y(n+l) x _ v^x + ^ p i (x,p i ) p t ( 14 ) 

ieM(n) 

where x e H and M(n) = {i = n: (yi> r i) 4- 0} and 

Pi = -sgn J- 1 r (i e M(n)) 

1 l y i’ V 

and 



(i e M(n)) 


This expression for v( n+ l) is computationally more tractable than equation (12) 
for those problems for which equation (12) would necessitate storing v( n ) as a large 
rectangular array as in optimal control problems. However, equation (14) implies that 
only the representation of p i needs to be stored with each iteration. 

The elements of the class of algorithms outlined are distinguished by the manner 
in which the parameter Q! n is chosen with each iteration. Davidon (ref. 2), Broyden 
(ref. 1), and Goldfarb (ref. 14) proposed techniques for choosing Qf n in the finite - 
dimensional case. For Davidon's rank-one variance algorithm, a n = 1 for all n; how- 
ever, the scalar r n given by equations (11) is chosen so that certain inequality con- 
straints are satisfied. These constraints insure that Davidon's V^ n ) remain positive 
definite. Goldfarb's optimal variance algorithm required that a n be chosen so that 
J(x n + ois n ) be minimized with respect to a. The Broyden quasi-Newton method 
requires only that a n be chosen so that (v( n )) - ^ exists and It is shown 

subsequently that for certain choices of v(0), both Davidon's and Goldfarb's methods of 
choosing a; n satisfy Broyden's criteria. 

THEOREMS INDEPENDENT OF CHOICE OF a n 

It is assumed that the functional to be minimized is quadratic as defined by equa- 
tions (1) and (2). However, the first three theorems of this section are proved without 
using this fact. Hence, the results of these theorems hold if J is any differentiable 
function. 
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Theorem 1: B( n ) as given in equation (13) is a linear self-adjoint, nonnegative, 

. operator for all n, for any choice of Q! n (nonnegative, that is, for all xeH 
(x,B( n )x) £ o)'. 

Proof: Clearly, B( n ) is linear by the linearity of the inner product. If x e H, 

then 

(x,B( n )x) = (x,(x,r n )r n ) = (x,r n ) 2 S 0 
and if x,y e H, then 

(x,B( n )y) = (x,(y,r n )r n ) = (y,r n )(x,r n ) = (y,(x,r n )r n ) = (y,B( n ) X ) 

Theorem 2: is self-adjoint for all n, for any choice of a n . 

n-1 

Proof: yW = V^) + ^ t^B^ by equation (14) and V^) is self-adjoint by defi- 

i=0 

nition. By this theorem, the B^ terms are self-adjoint and the finite sum of self- 
adjoint operators is self-adjoint. 

Theorem 3 (Basic theorem): If T n + 0 or r n = 6, then v( n+1 )(g* - g n ) = x* - x n ; 

that is, v( n+ l)y n = a n s n . 

Proof: If r n = 6, then v( n )y n = a n s n ^ definition. Also since T n = 0 by equa- 
tions (11) and (12), v( n+ l) = v( n ); thus, v( n+1 )y n = Q! n s n . Otherwise, consider 

v( n+1) y n - « n s n = V (n) y n + ^n(yn^n) r n ~ « n s n 

= v(n)y "-fe^ r "'“" s - 

= V^ n) y n " «n s n - r n = 6 


by definition of r n . 

The rank-one algorithms are Newtonlike in that the change in x at each iteration 
is given by a linear operator upon the gradient at x n . The name quasi-Newton was given 
to these algorithms by Broyden (ref. 1) because of the property of y( n+ l) given in theo- 
rem 3. Notice that for a quadratic functional such as equation (1), A - * also has this 
property, that is, A _1 (g*~ g n ) = x* - x n or 

A-1 y n = “n s n ( 15 ) 

Therefore, v( n+ *) and A~* agree on the vector g* - g n . 
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The next theorem and its corollary state that because of the manner in which B( n ) 
is defined, if A - * and v( n ) agree on a vector, then A - -*- and v( n+ l) also do. That 
is, information about the nature of A~* which \K n ) contains will be retained by v( n+ ^. 

Theorem 4: If ueH is such that A'^u = v( n )u and C:H — H is a linear oper- 
ator such that C = v( n ) + pB( n ) for some real p, then A~lu = Cu. 

Proof: Since A -1 (g* - g n ) = x* - x n and x* = x n - a n v( n )g n by definition, 
x * - x n = - a n v(n) Sn = A_1 (g* " gn) 

and 

r n = vW(g* - g n ) + «n v ^^n 

Therefore, r n = \K n )(g* - g n ) - A“^g* - g n ) and, hence, 

r n = (v( n ) - A-!)(g* - g n ) (16) 

Since A - lu = v( n )u, 

(vW - A _1 )u = 9 (17) 

Therefore 


( r m u ) = (( v ^ n) - A X )(g* " gn)> u ) = (g* - gn> ( v ^ n) - A = (g* - g n > #) = 0 

by theorem 2 and equations (16) and (17). Hence, the hypothesis (c - A _1 )u = pB^u 
implies that 

pB^u = p(u,r n )r n = p • 0 • r n = 9 
Therefore, Cu = A'^u. 

Since \K n+ ^ = + T n B^ n \ the following corollaries are obtained. 

Corollary 1: If \K n )u = A“^u for some u e H, then \K n+ l)u = A'-^u. 

Corollary 2 ( Fundamental property of v( n )): v( n )yi = aqs, = A - -^ for all i<n 
-if— -Tj * 0— for -j =-0,-1, — .,-n. 
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Proof: Note as previously observed (eq. (15)) since J is quadratic, A - -^ = c^Sp 
-From- theorem -3; 

v (1) y 0 = a o s o 

Assume that V^ n Vi = for all i < n. Consider V^ n+ ^y| for i = n. Since r n + 0 

by theorem 3, v( n+ ^Vn = ^nSn- Otherwise, for i < n, A - * and v( n ) agree on yp 
Corollary 1 implies that v( n+ l)y^ = a^sp Thus the corollary follows by mathematical 
induction. 


Corollary 2 is most useful in later convergence arguments and, hence, it has been 
named "the fundamental property of v( n )." 

By equation (15) and theorem 1, Goldfarb (ref, 14) observed that equation (12) can 
be expressed as 


V(n+!) = v( n ) - 

((v ( "> - A-l)y n ,y n ) 


if r n + 0; otherwise, \K n+1 ) = which yields the following theorem. 

Theorem 5: If V^) = A - *, then = A~* for all n and similarly, if 

v(°) g A' 1 , then v( n ) § A -1 for all n. 


Proof: Proof of this theorem is by mathematical induction. Assume that 

\K n ) ^ A - *. If \K n+ *) = V^ n ), that is, r n = 0, the result is trivial. Otherwise, by the 
preceding equation 


^x,(v( n+ l) - A *)xj = ^x,(v( n ) - A'*)xj 



and by equations (10) and (15), r n = (v^ n ^ - A - *)y n . The inductive hypothesis implies 
that v( n ) - A - * is a positive operator. Hence, by the Cauchy -Bunyakovski-Scharwz 
inequality (refs. 18 and 19), 




^ 0 


The second part of the theorem is obtained by merely considering ^x,(a * 
instead. 


v( n+1 )) 


x 
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The following theorem whose finite dimensional analog is due to Goldfarb (ref. 14) 
gives a condition under which the v( n ) operators form a monotone sequence of self- 
adjoint bounded linear operators. 

Theorem 6: If y(^) = A - '*', then A”^ S y( n ) = . . . = v(^) for all n. Similarly, 
if V(°) ^ A -1 , then A -1 1 v( n ) ^ . . . ^ y(°) for all n. 

Proof: By theorem 5, if v(^) 5 A - then y( n ) = A~* for all n. If y( n+ l) = y( n ), 
that is, r n = 0, then the assertion is obvious. Otherwise, 

;,(y( n +l) . yWU = . ( x ' B(n)x ) <„ 

(y a . M"> - A-^Vn) 

by equation (12). The inequality holds since theorem 1 gives (x,B( n )x) = 0 and from 
theorem 5, v( n ) - A'^ g 0. The second part of the theorem follows by considering 
v(°) - v(n+l) instead. 

Corollary 3: If V^=A - 1 or v(^) = A - *, then the y( n ) operators form a 
monotone sequence of strongly positive self-adjoint linear operators bounded 

by v(0) and A~l. Moreover, there exists a strongly positive self-adjoint 

operator V such that lim ||v( n )x - Vx|| = 0 for each x e H. 

n— °° 

Proof: The v( n ) operators form a bounded monotone sequence of strongly 
positive, self-adjoint operators by theorems 2 and 6; that is, if = A’*, then 

y(0) i y(l) ^ y(2) ^ . . . S y( n ) g . . . A - *. This relationship implies the existence 
of a strongly positive self-adjoint linear operator V such that v( n ) converges to V 
pointwise (ref. 18) and similarly, for V^) ^ A"-*-. 

Theorem 7: If r n * 0 for all n and if S is the space spanned by {yn}, then 

lim ||v( n )x = A - i-x|| for all x e S independent of the choice of the a n - ( B y 
n — 00 

the space spanned by a set M is meant the smallest linear manifold contain- 
ing M. ) 

Proof: Since x e S, this relationship means that x can be written as a finite lin- 
ear combination of y n vectors. Therefore there exist e R and some k such that 


k 



i=l 
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(Corollary 2) 


Let n > k, then 

k 

y(n) x = ^ ^ i V^ n )y l 
i=l 
k 

= I 

i=l 

= A_1 £ 
i=l 

= A _1 x 

Theorem 8: If v(®) = A~* or v(0) = A'* and r n * 0 for all n and the yj 

form a Schauder basis (ref. 20) for H, then vW — A~^ pointwise independent 
of the choice of <r n . 

Proof: For any x e H there exist ^ e R such that 


x = 



(18) 


since (yj} form a Schauder basis for H. Consider 


A _1 x - V( n )x = | 

|(a _ 1 - v( n ))x|| 

oo 


n-1 


OO 

= 

(a- 1 - v(n)) £ ^y. 

i=0 

< 

(A' 1 - V( n >) £ /3 iyi 
i=0 

+ 

(a- 1 - vw) y % yi 

L / 

i=n 


n-1 

By corollary 2, (a - * - V^) ^ 0jy^ = 9. Since = A~* or = A~^ by theo- 

i=0 


rem 6 and its corollary, it must be that 


1 1 vH I s 


-i 


or 


s||v(0>| . 


Therefore 


| |a~ 1 - V(n) 1 1 is bounded for all n, and by equation (18) it follows that the remainder 
must go to fcero, that is, 


1 Vi 

i=n 


— 0 as n — <*>. Therefore, lim ||A"^x - v( n ^x 

n— °° 


= 0. 
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Notice that all these results have been established without regard to the choice 
of o! n . As defined in equations (10), r n is a residual vector. The reason for this 
terminology will now be explained. 

Suppose r n = 6 for some n. Then v( n )y n = a n s n , and if (vfr))' 1 exists, 
y n = -a n (v( n )) -1 v( n )g n = -a n g n so that by definition y n = g* - g n = -a n g n . Since J 
is quadratic, g* = g n + Q! n As n . Therefore, a n As n = -a n g n so that s n = -A'ign. 

Hence, since s n = -v( n )g n , vWg n = A _1 g n . 

Recall that the minimum of J (quadratic as in eqs. (1) and (2)) is attained by 
x = x n - A“lg n . Step (2) of the basic algorithm is that if r n = 8, Q! n + 1; therefore, let 
a n = 1 and repeat step (1). Then the new x* is x* = x n - v( n )g n and it has been 
shown that v( n )g n = A"^g n . Therefore, x* is the location of the minimum of J. This 
result explains the reason for step (2), and the following theorem has been proved. 

Theorem 9: If r n = 8 and (V(n))- 1 exists, then by applying step (2) of the basic 
algorithm, Q! n is set equal to 1 and the resulting x* given by x* = x n - v( n )g n 
is found to be the location of the minimum of J. 

CONVERGENCE IF a n IS CHOSEN BY A ONE -DIMENSIONAL 
MINIMIZATION PROCESS 

There are two rather obvious ways to choose a n at each step: (1) let a n = 1 
for all n, and (2) let Q! n be such that J(x n + a^Sn) = J(x n + Xs n ) for all real X. Both 
cases have been investigated by Davidon and Goldfarb and convergence has been estab- 
lished in the case of a quadratic functional on a finite -dimensional Hilbert space. 

The convergence of the algorithm to the location of the minimum of a quadratic 
functional on an infinite -dimensional real Hilbert space when Q! n is chosen for every n 
so that 


j(x n +a n s n )^J(x n +\s n ) '■ (19) 

for all real X will now be shown. This relationship, of course, implies that x n+ j = x* 
in step (5) of the algorithm. If a; n is chosen in this manner, then by necessity 

dJ( x n + Xs n ) _ n 

dx 

at X = cv n . Hence, 


«n = - 


16 


( s n>&ri) 

^s n ,As n ) 


( 20 ) 



— Therefore, it must be that 


J(x„ + l) - J(xo) - i 


Since, inf J > -°°, 



which implies that by necessity 


lim 

i— oo 


( s i>gj) 2 

( s b As i) 


= 0 


( 21 ) 


Since its derivation in no way depended on the method of choosing si, equation (21) 
must be true for any descent method. This well-known result and the following lemma 
are given by Horwitz and Sarachik (ref. 10). They used their results to prove conver- 
gence of Davidon's first method, steepest descent, and the conjugate gradient method in an 
infinite -dimensional real Hilbert space for the problem under consideration, namely, min- 
imizing a quadratic functional as in equation (1). 

Lemma 1: If g n - 9 as n — °°, then x n converges in norm to the location of 
the minimum x = -A“^b. 

Proof: 0 = (^x n + A“lb,A(x n + A~lb)j = (x n + A _ lb,g n ) g |Jx n + A _: *-b|| |jg n ||. Now 

||x n + A“^b]j is bounded for all n, since for all n, x n is contained in a bounded set, 

namely, S^. = convex e H:J(x) S J(xq)} (ref. 15), the closed convex hull of the indicated 

set. Hence lim (x n + A - lb,A(x n + A _ lb)] = 0 and since A is strongly positive, 
n— «> v ' 

lim x n + A“^b = 0. 
n^°° 

A general convergence theorem for this case can now be proved. 

Theorem 10: If there exist positive reals a,f3 such that al = v( n ) = /3I for 
all n larger than some N and if a n is chosen as in equation (19) then 

lim ||x n + A"^b|| = 0, that is, x n converges in norm to the location of the 

n— oo 

minimum. 
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Proof: Since for all u e H, m||u|| 2 = (u,Au) = M||u|| 2 , 

1 < 1 < 1 
M j |u 1 1 2 (u, A u) - ml|u|| 2 

and since a | |u| | 2 i (u,v( n )u) ^ /3||u||2 for all n sufficiently large, 

1 < 1 < 1 

/3||u|| 2 (u, V^u) a ||u|| 2 

Since v( n ) is self-adjoint, l|v( n )u|| = /3 1 1 u 1 1 . Therefore, 

0 


Therefore, by equation (21) g n — 0 as n — °° and by lemma 1 x n — A _1 b in norm. 

The hypothesis of theorem 10 is more restrictive than necessary, only 
a ||g n || 2 = (g n ,v( n )g n ) = /3||g n || 2 is required. A result similar to that of theorem 10 
but more general can be found in reference 7. Theorem 5 can now be combined with 
theorem 10 to state a convergence theorem for the rank-one iterates. 

Corollary 4: If v(0) = A“1 or Vw) = A - * and a! n is chosen as in equation (19), 
then J(x n ) converges to the minimum of J(x), and moreover x n converges 
in norm to the location of the minimum. 

Proof: If v(0) ^ A - -*-, then by theorems 5 and 6, v(0) ^ v( n ) ^ A~1 for all n. 

Hence, mol = v( n ) S — I for all n. 

’ u m 

CONVERGENCE WITH A MORE GENERAL CHOICE OF a n 

In this section, as in the previous one, the convergence of rank -one iterates to the 
location of the minimum of a quadratic functional as in equations (1) and (2) is investigated. 
However, herein the step size Q! n need not be chosen by a one -dimensional minimization. 
Conditions on the initial estimate and the. a n scalars under which the iterates converge 
to x = -A" lb, the location of the minimum, are determined. 

Let denote the sequence of step sizes used by the rank-one algorithm. (From 

eq. (8), x* = X| + «jSj for each L) For each_ n.= 1, denote by -N(n) -that set of" integers 

less than n for .which j(jq - ajVWgi) < and the integer n - 1; that is, N(n) con- 

tains the indices of those step sizes which caused a decrease in the function being mini- 
mized and the integer n - 1. If y^ = g^ + j - then 

18 


( s m Sn) 2 > ( s n>Sn)^ _ (gn»^ n ^n) > (&n> V^&n) > q?2 H^nll _ g 2 n 

( Sn.ASn \ nr I I _ I 1 2 I I I i 2 nr ^2 II _ I I 2 Tl/ro2 1 1 | | 2 'n/r/l2 1 1 ^ 


MV' 


M ^i|g n ir 



g n = g 0 + Z y i ' ' ( 22 ) 

ieN(n) 

Hence, 

v (n)g n = V (n)g 0 + Y v(n) y . = V ( n )g 0 + £ (x i+1 - xj) (23) 

ieN(n) ieN(n) > 

From corollary 2, v( n )yj = = x^ + j - x^ for all i < n, and hence for all i e N(n). If 

X| + i ~ X| is denoted by m, then by step (5) of the algorithm and the definition of N(n) 

V 

x n = x 0 + Z CT i ’ ( 24 > 

ieN(n) 

From equations (8), (23), and (24), 

x * = x n - <*n v(n) g„ = x 0 + Z °i~ a n( v(n)g 0 + Z CT i) 

ieN(n) \ ieN(n) ) 

Hence, 

x* = x 0 - a n v( n )g 0 + (1 - a n ) ^ a i (25) 

i eN(n) 

Then by using equation (25) 

(1 - a„) J Ui 
ieN(n) 

In order to establish convergence, it must be shown that j|x* + A“^b[| can be 
made small as n — °°. Let = conv{x e H:J(x) ^ J(xq]}. Since it is known that 
is bounded (ref. 15), the following lemma can be proved. 

Lemma 2: If (o; n - l)n — 0 as n — °° and there exist a, (3 > 0 such that 

al ~ \K n ) = (31 and r n * 0 for all n, then (1 - a n ) o| - 0 as n — °° 

ieN(n) 
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Proof: By definition ||a^| | = j |a^S|| j = -Q!|VU)g^ . Therefore 

11^1 1 = |«i| | v( i )(Ax i +b) ^ |of i |{l|v( i )|| • 1 1 A| | || Xi || + ||vW|| -Hblj} (27) 

Since X{ e is a bounded set, J | xj_ 1 1 is bounded and since cq - 1 as i — is 

bounded. By hypothesis t|vU)|| S3 and || A 1 1 S M; thus, everything on the right-hand 
side of equation (27) is independent of i and ||cr^|| = L for some L = 0 and all i. 
Hence, 

(l - or n ) ^ o- t = (l - «n) n * L - 0 
ieN(n) 

since by hypothesis (a n - l)n — 0 as n — °°. 

Lemma 3: Suppose either gQ is an element of the smallest subspace containing the 
y^ vectors denoted by S(y^ or the y^ vectors form a Schauder basis for H. 

If the v( n ) operators are uniformly bounded, a n — 1 as n °°, and T n # 0 
for all n, then J (a - - 1 - - Q! n v( n ))gQjj - 0 as n - 

Proof: Write of n as 1 - e n ; then since o? n 1 as n — °°, ^ - 0 as n — °°. 
Therefore, 

(A' 1 - a? n v( n ))g 0 = A’^q - v( n )g 0 +e n v( n )g 0 i A -1 g 0 - v( n )g Q + |e n v( n )g Q 
Hence because of the hypothesis on gQ, lim A“^g 0 - v( n )gQ = 0 from theorem 7 or 8. 

n— oo 

Also since the v( n ) operators are uniformly bounded, v( n )gQ must be bounded for 
all n. Therefore lim |e n | v( n )g Q = 0. Hence the lemma is proved. 

n— °o ' 

The following statement can now be made. 

Theorem 11: Suppose g Q e S(y 4 ) or the y^^ vectors form a Schauder basis for H. 
If T n ± 0 for all n, the v( n ) are uniformly bounded, and if (a n - l)n — 0 as 
n, — then ||x* + A _ ^b|| — 0 as n - °°. 

Proof: By equation (26) 

x*+A~ 1 b s (A -1 - a n v( n ))g 0 || + (1 - a n ) ^ 

ieN(n) 

and by lemma 3 the first term goes to zero. By lemma 2 the second term goes to zero. 
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Hence, conditions under which two variations of the basic algorithm converge to the 
location of the minimum of a quadratic functional have been established. These condi- 
tions are given in theorems 10 and 11. In both of these theorems, the primary interest 
is in the convergence question for an inf inite -dimensional Hilbert space. In a finite- 
dimensional space of dimension n, it can be seen that for almost any collection of non- 
zero Q! n scalars, the algorithm converges to the location of the minimum in a finite 
number of steps. The conditions on the a n scalars and the proof are given in the fol- 
lowing theorem from Broyden (ref. 1). 

Theorem 12: If Tj + 0 and aj + 0 for all j = 0, 1, . . . and if (V(i )) * exists 
for all j, then after at most q + 1 steps x* = -A"*b where q = dim(H). 

APPLICATION TO OPTIMAL CONTROL THEORY 

Fixed -time problems will be considered since by a simple transformation (ref. 17), 
the free-time problem can be transformed into a fixed-time problem. Moreover, Horwitz 
and Sarachik (ref. 21) have given several other schemes for solving the free-time prob- 
lem by using fixed-time techniques, and these schemes are applicable when the Davidon- 
Broyden algorithms are used. 

A Quadratic Payoff With Linear Constraining Differential Equations 

From the class o£^(to,ti), the problem is to find a function u(t) which minimizes 



subject to the constraints 

x(t) = G(t) x(t) + B(t) u(t) (29) 

and x(tg) = xg where xq, tg, and tj are fixed. For this problem, 


X 

n vector 


u 

r vector 


G(t) 

n x n matrix with components in 

l 1 (Vi) 

B(t) 

n x r matrix with components in 

L !( tg, 1 1) and bounded 


2-1 



p(t) n x n symmetric, positive semidefinite matrix the components of which are 

piecewise continuous on (tQ,tjj 

R(t) r x r symmetric, uniformly positive definite matrix the components of which 

are piecewise continuous on (tQ,t i) 

Horwitz and Sarachik (ref. 10) have shown that this problem can be considered as 

2 

that of finding the location of the minimum of a quadratic functional on This 

function is exactly the type of function for which the conditions given in theorems 10 and 11 
guarantee the convergence of the various modifications of the basic algorithm. 


General Optimal Control Problems and Gradient of Payoff 

In this section a class of problems generally referred to as optimal control prob- 
lems (ref. 22) or in the calculus of variations as Lagrange problems (ref. 17) is described. 
Also presented formally are the mechanics of applying the algorithms discussed to com- 
pute solutions to these problems, even though these functions generally are not quadratic. 

Given a system of n differential equations such as 

x(t) = f(x,u,t) (30) 


with x(tq) = xq and u e R r , find that function u = u(t) which minimizes the value of 

ptf 

\ L(x(t),u(t),t) dt. Assume that f(x,u,t) and L(x,u,t) have continuous partial deriva- 

Jt o 

tives of at least second order in x and u and are piecewise continuous in t. Also, 
assume that there are no constraints on u or x, other than that x must satisfy 
equation (30). 

Moreover, it will be assumed that L and f are such that corresponding to every 
u = u(t) e °£^(tq,tij, a real Hilbert space, there exists a solution, x = x(t), of equation (30) 

pti 

and that for this x and u the integral \ L(x(t),u(t),t) dt, exists. Hence, the func- 

Jt 0 

tional J:H — R can be defined by 



where x(t) is a solution of equation (30) corresponding to u. 
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Therefore, the problem is that of locating the minimum of a functional J on a real 
Hilbert' space — H: — Ini3rder to - apply the - algorithms' diScussedearireb, the grhdient'bf J 
must be computed. The gradient of J is that part of J^u + 6 U ) - J(u) which is linear 
in Su- 


it can be shown (ref. 23) that if the Hamiltonian is defined as 


H(x,X,t,u) = L(x,u,t) + X T f(x,u,t) 


(31) 


then the gradient of J at u is given by 


g(u) = V U H = 


L u (x(t),u(t),t) + X T f u (x(t),u(t),t) 


oT 


(32) 


where 


X = f(x(t),u(t),t) = |S (x(t 0 ) = x 0 ) 

X = - |S(x(t),X(t),t,u(t)) (x(ti) = o) 

The steps necessary to compute the gradient of J at u = Ug(t) are (1) numerically 
integrate x = f(x,u 0 ,t) with x(tg) = xg forward to t = tj, and (2) at t = ti integrate 
(again numerically) 

X = -fj(x,u 0 ,t)x - Lj(x,u 0 ,t) 

with x(t]J = 0 backward to t = tg. Therefore, the gradient as given in equation (32) can 
be computed by using the control u = ug(t) and the values of x(t) and x(t) previously 
computed. In practice, uq is usually expressed in tabular form. H the gradient is 
computed according to equation (32), then and r n can be computed as in equa- 

tions (13) and (10). Hence, the algorithms outlined can be used to generate a sequence of 
controls which hopefully converge to the optimal. The practical significance of theorem 11 
is to demonstrate that the rank-one algorithms are not as sensitive to the accuracy of the 
one -dimensional minimization as are the other widely used gradient algorithms, for exam- 
ple, conjugate gradient and Davidon- Fletcher -Powell (DFP) methods. For minimizing 
many functions (not necessarily quadratic) by these two algorithms, a large part of the 
computing time is required to solve the one -dimensional minimization problem repeatedly. 

EXAMPLE PROBLEM AND RESULTS OF ANALYSIS 

In order to exhibit the convergence characteristics of the rank-one algorithms, a 
sample optimal control problem which others have used to display convergence character - 
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istics of other algorithms (refs. 8, 9, and 11) has been chosen. The problem is to find 
the function u = u(t) which minimizes 

J = y (x^ + x| + u 2 j dt (33) 

subject to constraining differential equations described by the Van der Pol equation 
(ref. 24) with e = 1, that is, 



with initial conditions 


x x (0) = 3.0 
x 2 (0) = 0.0 

By equation (32) the gradient g of J at u is given. by 
g(t) = 2u(t) + X2(t) 


where 


Xj = [l + 2 xjX 2 ^X2 - 2xj 

x 2 = -x x - (l - xf)x 2 - 2 x 2 


(35) 


(36) 


with 


X X (5) = 0 
X 2 (5) = 0 

In order to compute the gradient g(t) of J at some t, equations 34 are inte- 
grated forward to t = 5.0 by using u - Ug(t).- -Next,- equations (36) are integrated from 
t = 5.0 back to t = 0.0. Then by using u = ug(t) and the computed value of X 2 (t), 
g(t) given by equation (35) can be computed. 
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Figures 1 and 2 depict the progress toward the minimum of J by using the algo- 
rithm outlined with four different methods of choosing -a n .~ These-four methods of 

choosing Q! n are: 

(1) a n = 1 - (n^ + 2) for all n. 

(2) Q! n = 1 for all n. 

(3) a; n = min /l.O, [j m - J(u n )] /(s n ,g n )j where J m is the estimated minimum 
value of J, s n is defined by equation (9), and g n is the gradient of J at u = u n (t). 

(4) a n is the minimum with respect to 01 of J(u n + Q!s n ) as computed by 
Davidon's one -dimensional cubic minimization method (ref. 2). 

Methods (1) and (2) of choosing a! n satisfy the condition given in theorem 11 (for 
these methods (l - a! n ^n — 0 as n — °°). The form of a n for method (3) follows by 
considering J m = J(u n ) + an(sn,gn) + higher order terms, dropping the higher order 
terms, and solving for a n, where J m is some nominal estimate of the minimum 
of J along the line un + ot s n . 

Notice that methods (1), (2), and (3) of choosing a n involve no extra functional 
and gradient evaluations; that is, for each iteration equations (34) and (36) must be inte- 
grated only once.. The fourth method of choosing Q! n , although the one -dimensional min- 
imum is computed more accurately than by method (3), involves at least one more func- 
tional evaluation per iteration. Hence, with the fourth method of choosing a n , at least 
two functional and gradient evaluations per iteration are required. 

In figure 2, J(u n ) is plotted against n (that is, the iteration number) for the four 
different methods of choosing a n . Figure 2 shows that the fastest convergence in terms 
of iterations is achieved by the algorithm with a n chosen by method (4). Also, figure 2 
shows that after 12 iterations, all the methods have converged. Moreover, after eight 
iterations for all methods of choosing a n , the change in the value of J is too small to 
show up in the graph. 

In figure 3, J is plotted against the number of functional evaluations. Notice that 
in figure 3, methods (3) and (1) converge faster with respect to function evaluations than 
method (4). Note also that after at most eight functional evaluations, the change in J is 
too small to be noticed in the graph. 

Figure 4 shows the rates of convergence to the minimum for the example problem 
for the three first-order methods (steepest descent, conjugate gradient, and DFP). These 
results were reported by Tokumaru, et al. (ref. 11). Note that the DFP algorithm shows 
the fastest rate of convergence. 

By using the same initial estimate of u that was used for the results shown in 
figure 2, the DFP method was applied to the example problem. The results for the DFP 
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Figure 3-- Rank-one performance for various step size criteria. 



J 


Figure 


27.2 



Number of iterations 

Figure 4 .- Comparison of previous methods. From reference 11. 



Number of functional evaluations 


5.- Eahk-one performance compared with Davidon-Fletcher- Powell (DFP) method. 
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method were identical to those of the rank-one algorithm with a n chosen by method (4). 
The reduction in the payoff and the iterates for the two methods were the same. 

In figure 5, the values of J(u n ) have been plotted against the number of function 
evaluations for the DFP method and the algorithm when a< n is chosen by method (3). 
Notice that in terms of function evaluations, the rank-one method for this choice of a n 
converges faster than the DFP algorithm. The linear minimizations for the DFP algo- 
rithm were carried out by method (4). This method was chosen because high accuracy 
in the linear minimization is necessary for the DFP method. 

The rank-one algorithms outlined herein have several attractive properties. Theo- 
rem 7 gives conditions under which v( n ) — A - * pointwise where v( n ) is given by equa- 
tion (12) and A is given by equation (2). This property can be used to accelerate the 
convergence when many solutions corresponding to different initial conditions are desired. 
This property is not available with the method of conjugate gradients. Moreover, these 
rank-one algorithms require one -half the storage necessary for the DFP method. Also, 
rank -one methods require the computation of only one operator per iteration whereas the 
computation of two operators are required per DFP iteration. 

The results of the example problem show that the algorithm can be applied with suc- 
cess when a n is chosen in a variety of ways. It appears that method (3) of choosing a n 
is best when the functional to be evaluated is very complex, its computation is time con- 
suming, and storage considerations are not as important. If storage considerations are 
pressing and the computation of the functional is not as time consuming, then method (4) 
would seem to be the best choice for a n . 

CONCLUDING REMARKS 

The extension to a real Hilbert space of the rank -one minimization algorithms has 
been presented. Conditions insuring the convergence to the location of the minimum of 
a quadratic functional are given for various techniques of choosing the step size. Also 
the application of these algorithms for the direct computation of optimal controls has been 
outlined. An example optimal control problem has been solved by using several rank-one 
algorithms. The example problem shows that rank-one algorithms give promise of a fast 
rate of convergence, comparable to that for the Davidon- Fletcher -Powell (DFP) method. 
Moreover, the rank-one techniques use less storage and involve fewer computations than 
the DFP method. Also, the example problem illustrates that the one -dimensional search 
required by DFP is not critical for convergence when a rank-one algorithm is used. In 
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fact, for the example problem, convergence to the location of the minimum was achieved 
by-using four- different methods for choosing the step size. Two of the choices for step 
size used in the example were independent of the function values and yet convergence 
was achieved. 

Langley Research Center, 

National Aeronautics and Space Administration, 

Hampton, Va., October 26, 1972. 
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